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Abstract: The aim of this paper is to develop a method to estimate high order FIR and ARX 
models using least squares with re-weighted nuclear norm regularization. Typically, the choice of 
the tuning parameter in the reweighting scheme is computationally expensive, hence we propose 
the use of the SPARSEVA (SPARSe Estimation based on a VAlidation criterion) framework to 
overcome this problem. Furthermore, we suggest the use of the prediction error criterion (PEC) 
to select the tuning parameter in the SPARSEVA algorithm. Numerical examples demonstrate 
the veracity of this method which has close ties with the traditional technique of cross validation, 
but using much less computations. 
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1. INTRODUCTION 

In the last few decades, utilizing regularization in es¬ 
timating transfer functions has received much attention 
(Pillonetto et al. (2014)). The basic idea is to add a penalty 
on the weighted size of the parameters to the least square 
cost function. There are several different regularization 
schemes that have been examined. For example, Tikhonov 
regularization (or ridge regression) uses a weighted I 2 norm 
as the penalty function to solve ill-posed problems in 
Tikhonov and Arsenin (1977). More recently, an important 
contribution to this field is from Pillonetto and De Nicolao 
(2010), where estimation methods based on Gaussian pro¬ 
cesses and spline kernels were examined. This approach 
has shown to have close ties with I 2 regularization in 
Chen et al. (2012). Another purpose of regularization is 
to obtain a sparse estimate. One of the early contributions 
to this approach is the LASSO (Least Absolute Shrinkage 
and Selection Operation) that minimizes the least square 
cost function under a constraint on the li norm of the 
parameter vector, which could also be interpreted as /i 
regularization in Tibshirani (1996). The idea behind of 
the LASSO is to use the h norm as a surrogate for the 
Iq norm. 

Recently, the nuclear norm has been used in this approach 
and shown to be a useful method for identifying low 
order linear models. It is first suggested to be used in 
sparse estimation as a rank minimization heuristic in Fazel 
et al. (2001). After that, Grossmann et al. (2009a,b) have 
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proposed a system identification method to fit a flexible 
FIR model to measured data and at the same time penalize 
model order complexity by minimizing the nuclear norm 
of the corresponding Hankel matrix. An advantage of 
this approach compared to classical system identification 
methods is that the corresponding optimization problem 
is convex. The trade-off between fit and complexity is 
determined by a design parameter, which is typically 
found by cross-validation. This has been extended to ARX 
models by Hjalmarsson et al. (2012). 

As the nuclear norm is an approximation of the rank 
function, by using a singular value weighting, this approx¬ 
imation can be further improved (Fazel et al. (2003),Mo¬ 
han and Fazel (2010)). The objective of this paper is to 
study the use of re-weighting for the nuclear norm system 
identification problem by developing a method to estimate 
high order FIR and ARX models using least squares with 
re-weighted nuclear norm regularization. 

The choice of the tuning parameter will then, however, 
be more involved as the reweighted nuclear norm regular¬ 
ization is an iterative process. To overcome this problem 
we propose the use of the SPARSEVA framework, which 
eliminates the need for cross-validation to tune the regular¬ 
ization parameter (Rojas and Hjalmarsson (2011); Rojas 
et al. (2014)). The SPARSEVA is originally motivated by 
classical structure selection criteria, such as Akaike Infor¬ 
mation Criteria (AIC) and Bayesian Information Criteria 
(BIG). Here, we propose the use of the prediction error 
criterion, which leads to the SPARSEVA - PEC method. 



The paper is organized as follows. Section 2 describes the 
system identification problem. Section 3 establishes the 
proposed reweighted nuclear norm regularization method 
and discusses the use of SPARSEVA. Section 4 suggests 
an alternative way to choose the regularization parameter 
in the SPARSEVA method. In section 5, some numerical 
experiments and results are presented. Einally, section 6 
provides the conclusions. 


2. SYSTEM IDENTIEICATION PROBLEM 


Consider a discrete-time linear, time invariant, single input 
single output stable system, 

y{t) = Go{q)uit) + Ho{q)e{t), (1) 

where u{t) is the input signal and e{t) is zero mean white 
noise, independent of the input. 

Given past data {y{t),u{t)}, t = the task is to 

estimate a model of the transfer function Go{q)- A typical 
approach is to apply the Prediction Error Method (PEM), 
however, PEM can have problems with local minima, e.g. 
when a rational transfer function model is estimated using 
the Output Error (OE) model. The advantage of the PEM 
is that only a few parameters need to be estimated. 

Another common and simple approach to estimate Go{q) 
in (1) is to use the Einite Impulse Response (FIR) model, 

n 

G{q) = Y.gm-\ (2) 

k=l 

where n is the order of the FIR model. The vector 9 = 
[ 51 , 32 , can then be estimated using least squares 

by solving the linear regression problem corresponding to 
the model, 

y{t) = (p{t)'^9, (3) 

with 

ip{t) = [u{t-l), (4) 


However, as n increases, the variance of the estimate 
will also increase. Regularization can then be used to 
overcome this issue (Ljung et al. (1991))(Chen et al. 
(2012)). As mentioned in the introduction, there are many 
regularization schemes where in this paper, we will only 
focus on the nuclear norm regularization. 


Consider the Hankel matrix T-L{9) formulated from 9, 


n{9) 


91 92 ■■■ 9m 

92 93 ■ ■ ■ 9m+l 


LQm 9m-\-l • • • 9n J 


( 5 ) 


where m = (n -I- l)/2 (n should be chosen as an odd 
number). For a linear system with impulse responses 
9 = [ 91 , 92 , ■■■, 9nf^, the rank of the Hankel matrix in (5) 
equals to the order of the linear system (Kailath (1980)). 
Therefore, the Hankel matrix rank minimization is often 
used as a important approach in model order reduction 
problems, of which a common surrogate heuristic for the 
rank function is the nuclear norm. 


The nuclear norm for a matrix X G is defined by 

min{n,m) 

ii^iu= E (6) 


where ai{X) are the singular values of X. Note that the 
nuclear norm gives a measure of the matrix rank and is 
equal to this rank if the singular values equal to 1 and 0. 


The nuclear norm regularization algorithm can then be 
written in the following form by semi-definite program 
(SDP) (Grossmann et al. (2009b)): 


mm 

ff,Wl,W2 


s.t. 


N 

E 


(gii) - + ^Tr(lEi -f W 2 ) 


'Wi 

n^{9) 


n{9) 

W 2 


( 7 ) 


> 0 . 


The regularization parameter A can be chosen using cross 
validation, where the data is split into two parts, namely 
the estimation part and the validation part. The model 
is then estimated from the estimation data using different 
values of A. The estimates are then validated using the 
validation data; the value of A that gives the smallest sum 
square error will be chosen. The model can then be re- 
estimated using the whole dataset with this chosen value 
of A. 


In addition, another extension of the nuclear norm regu¬ 
larization (Hjalmarsson et al. (2012)) is to use the ARX 
model. 




AM 


'(i), 


( 8 ) 


with 

B{q) = biq ^-|-...-f "®, 


■^{q) — 1 + ^ -f ... -I- an^q 


where ua is the order of the ARX model. 


As shown by Ljung and Wahlberg (1992), when ua in¬ 
creases, Go{q) can be well approximated by B{q)/A{q) and 
iJo)?) can be approximated by 1/A{q). As demonstrated in 
Hjalmarsson et al. (2012), using a nuclear norm relaxation 
can allow the high order estimated model to have a similar 
accuracy to a low order model; and the performance of this 
method is competitive with respect to the prediction error 
method. 


For an ARX model, the linear regression problem to 
estimate the system parameters can be formulated as 

A{q)v{t) = B{q)u{t) + e{t), (9) 

which is the same as 

riA riB 

y(t) = - E ^ky(t - A:) + E ^ku(t - k) + e(t). (10) 

fc=l k=l 


By using nuclear norm regularization, the algorithm to 
estimate the parameter vector 9 is as follows: 

N 2 

min W (y(t) — (p^9\ 

e,Wlc.,W2a,Wu,W2b ^ V / 

k—\ 

+ y Tr(ITi, + IE2a) + y Tr(VEi, + IT26) 


'Wia 

nia) 

9{^(a) 

W2a 

Wib 

n{b) 

n^{b) 

W2b 


( 11 ) 

where 



^(t) = {-y{t - 1) - y{t - ^a) u{t - 1) ... u{t - ns)]^, 

0 — [ftl ... ••• j 

a=[ai ... b = [bi ... . 

Another contribution to the nuclear norm regularization 
was recently suggested by Blomberg et al. (2014, 2015). 
Here, a new methodology that utilizes the regularization 
path to approximately calculate all solutions of the nuclear 
norm regularization optimization problem to a certain 
tolerance of the model reduction problem as a function 
of the design parameter. 


3. REWEIGHTED NUCLEAR NORM 
REGULARIZATION WITH SPARSEVA 


The reweighted nuclear norm was developed by Mohan 
and Fazel (2010) and Fazel et al. (2003) for solving the 
rank minimization problem. The idea is to use the log- 
det heuristic to approximate the rank function. Based on 
this, we develop a regularization method that utilizes the 
reweighted nuclear norm to make a further improvement 
on the nuclear norm regularization. 

Consider the linear regression problem, 

Yn = + e, (12) 

where is the parameter vector to be estimated. 


The regularization method using the reweighted nuclear 
norm for the FIR model can be written as (Mohan and 
Fazel (2010)), 


mm 

0,Wl,W2 


s.t. 


I Yn ■ 


^ logdet(IUi + SI) 

+ ^ logdet(IU 2 + SI) 

'lUi •H(6»)'' 

'U'T h 


-<^10 ||2 


(13) 


where d is a small additional regularization constant. And 
A is the regularized parameter. 


The previous convex optimization can be solved locally 
using an iterative process, where the step is to solve 
the following problem (Mohan and Fazel (2010)), 


mm 

e,Wi,W2 


s.t. 


Yn - 

+ \Yr{W!^ + 5I) 


12 + ^Tr(IU'=+<M) 


-1 


lUi 


-1 


IU 2 


(14) 


Wi 


n{e) 

IU 2 


> 0 


However, it is non-trivial to update A using cross validation 
technique in every iteration. One way to avoid this problem 
is to use the SPARSEVA method as developed by Rojas 
and Hjalmarsson (2011); Rojas et al. (2014). This method 
essentially removes the use of the regularization parameter 
A by introducing a constant cat which is based on some 
validation criterion. Specifically, SPARSEVA replaces the 
traditional Ip norm regularization criterion 


min \\Yn-<I>% 0\\1 

u 

s.t. II 6» lip < ?7 


by the criterion 

min II e lip 

' . (16) 
s.t. Vn{ 6 ) < Vn{ 6 ls){^ + Cat) 

where || . ||p is the Ip norm, Vm{9) =|| Yjq — H^, 

Sls = (4>Af4>^)“^<I’ArVv and cat > 0. 


As suggested in the existing SPARSEVA method, can 
be chosen as 2n/N or log(A^)n/iV based on AIC and BIG 
respectively (Rojas et al. (2014)). 


Finally, by using the SPARSEVA method, the implemen¬ 
tation of the reweighted nuclear norm can be written as, 

min Tt{W^+ dI)-^Wi+ Tt{W^ + 6 I)-^W 2 
0,Wi,W2 


s.t. 


Wi 

n'^ie) 


no) 

IU 2 


> 0 


VAr(d) < VAr(l + Cat). 


(17) 


4. THE SPARSEVA-PEC 


For the SPARSEVA method, the key question is how to 
choose the regularization parameter cat. In this section, we 
will describe an alternative to select the value of cat as to 
that proposed in Rojas et al. (2014). 

In general, we want to choose cat so that the true param¬ 
eter vector of the linear regression (12), 9q, belongs to the 
set Vn{9o) < P/v(0Ls)(lTew) with a high probability. We 
also need this set to be as small as possible in order to 
guarantee a good fit to the data, e.g. cm needs to be small. 


We use the results from Sec. 11.5 in Soderstrom and Stoica 
(1989) to choose cm- From Eq. (11.53) in Soderstrom and 
Stoica (1989), we have: 


VNiOo) 


(18) 


Vn{0ls) 

(1 — n/N) 

The right hand side of Eq. (18) is equal to an unbiased 
estimate of the true prediction error variance a^. Eq. (18) 
suggests a way to choose cat, that is to set 


Vn{0ls){^ + ^n) 


Vn{0ls) 

{1-n/NY 


which means. 


1 


« = (T^v' 

Note that for small n/N, cat ~ We will denote this 
choice of Cat by the PEG. In the numerical demonstration 
of this paper, we use the exact value 1/(1 — n/N) x n/N 
for Cat- 


Moreover, if we instead evaluate 9ls on future fresh data, 
the parsimony principle leads to the final prediction error 
(FPE) criterion, 

as mentioned in Eq. (11.54) in Soderstrom and Stoica 
(1989). Hence, in this case, 

1 2n 2n 

^ {l-n/N)~N IV’ *' ’ 

which corresponds to the AIG. We can see that the SPAR¬ 
SEVA - PEG gives the smallest set while the SPARSEVA 



- AIC set is slightly larger and the SPARSEVA - BIC 
(cat = log(A^)n/A^) set is the largest one. 

A further understanding of these design choices can be 
interpreted as following. From Soderstrom and Stoica 
(1989), it is shown that 

N[Vn{0o) — VAf(^Ls)] 

[Vn{0ls)/{ 1 - n/N)] 

is X 2 distributed with n degrees of freedom. Therefore, 
in general, given a probability a, there exists a value 
of Cat so that the probability that 6*0 belongs to the set 
Vn{0o) < VAr(0Ls)(l + CAf) is a. 

Note that this set is indeed a confidence set for 6q, and 
could be obtained from the result that VnIOls — 0o\ 
is Gaussian distributed with zero mean and co-variance 
matrix cTgFor the FIR case with determin¬ 
istic regression vector, this is exact, but for ARX it is 
asymptotic. Note again that VAr(0Ls)/(l — n/N), which 
corresponds to the choice of PEC, is an unbiased estimate 
of the variance a'/ when e is white noise. 

In summary, we can see that the BIC is asymptotically 
consistent while the AIC typically gives a too small set. 
The PEC gives an even smaller set that gives more trust 
to the least squares estimate. However, particularly when 
we are not in the asymptotic (large N) regime, the PEC 
is a good choice as confirmed by the numerical example in 
Sec. 5. 

5. NUMERICAL RESULTS 

In this section, numerical examples are presented to 
demonstrate the effectiveness of the proposed method. 
In Sec. 5.1, we will consider the case when the output 
disturbance is white noise whilst in Sec. 5.2, a coloured 
noise output disturbance will be considered. 


where gk is the impulse response of the real system and 
gk is the impulse response of the estimated system and 
mean(gfe) = l/nJ2l=i9k- 

The four methods that we compare are: 

(1) The nuclear norm regularization for an FIR model 
with cross validation (CV-FIR-N). 

(2) The nuclear norm regularization for an FIR model 
with SPARSEVA-PEC (SPe-FIR-N). 

(3) The reweighted nuclear norm regularization for an 
FIR model with SPARSEVA-PEC (SPe-FIR-RN). 

(4) The prediction error method (PEM) that uses the 
output-error model where it is assumed that the 
order of the system is already known. The PEM 
estimates are generated by the command oe from the 
MATLAB"'"'^ System Identification Toolbox (R2014b). 

The results of the experiment are provided in Table 1. 

Table 1. Average model hts of 150 randomly 
systems with white noise disturbance 


Cramer Rao 

CV-FIR-N 

SPe-FIR-N 

SPe-FIR-RN 

PEM 

90% 

88.79 

86.93 

87.94 

89.10 

77% 

76.39 

74.43 

75.84 

74.90 

68% 

68.63 

67.05 

67.64 

64.61 

55% 

59.81 

59.23 

58.48 

48.15 


Boxplots of fit scores from the experiment with 4 different 
noise level are displayed in Figs. 1 to 4. 



5.1 Output disturbance is white noise 


Fig. 1. Boxplot for Cramer-Rao bound of 90%. 


We conduct an experiment similar to the experiment in 
Hjalmarsson et al. (2012), i.e., 150 systems with order 
varying from 1 to 10 are generated using the command 
drss from Matlab. All the systems will have poles with 
magnitude less than 0.9. White noise is added to the 
system output at 4 different noise levels corresponding 
to the Cramer-Rao bound of the model ht of 90%, 77%, 
68% and 55%. For each noise level, the systems are 
estimated with three different input excitation signal and 
output noise realizations. The input signals have a low- 
pass characteristic, and is generated by filtering zero mean 
Gaussian white noise with unit variance through the filter 


Fu{q) = 


0.436 

l-0.9q-i' 


The sample size in each run is N = 450 and the number 
of parameters n of the FIR model is 35. 


The model fit is recorded for each system and then 
compared between several methods. The model fit is 
dehned as 


W = 100 



' Ek=i(9k-gk)^ 

.ELi(5fc -niean(5fc))2 



( 21 ) 
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Fig. 2. Boxplot for Cramer-Rao bound of 77%. 

From the above results, we can see that the SPe-FIR- 
N and SPe-FIR-RN methods show a competitive model 
fit when compared to the CV-FIR-N and PEM methods. 
In addition, the SPe-FIR-N shows a smaller number of 
bad estimates (outliers) than the CV-FIR-N and PEM 
methods. The SPe-FIR-RN shows an improvement on 
model fits with respect to the SPe-FIR-N. 

We also compute the average computational time of each 
estimation for the 4 methods. In this experiment, we run 
randomly 50 systems for 4 different noise levels, each 
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Fig. 3. Boxplot for Cramer-Rao bound of 68%. 



Fig. 4. Boxplot for Cramer-Rao bound of 55%. 

system is estimated with one set of input and output data. 
The setting for the MATLAB fminsearch command to 
search for the regularized parameter in the cross validation 
method is optimset(‘TolX’,0.1,'TolFun’,le-4). The 
YALMIP toolbox by Lofberg (2004) is used to implement 
all the regularization methods. This average computa¬ 
tional time is presented in Table 2. 

Table 2. Average computational time of the 4 
methods (in seconds) 


CV-FIR-N 

SPe-FIR-N 

SPe-FIR-RN 

PEM 

251.12 

12.88 

145.88 

1.14 


Table 2 shows that the computational time of the 
SPARSEVA-PEC regularization methods is much faster 
than the cross validation methods, which was expected. 

5.2 Coloured noise output disturbance 

The experiment setting is same as in Sec. 5.1. The differ¬ 
ence here is that in this experiment, we use a coloured noise 
output disturbance. The noise model has the same order 
as the system and is also generated by the drss command 
from Matlab with the same constraints on the poles. 

In this case, we will use both FIR and ARX models to 
estimate the transfer function. The sample size in each 
run is = 450 and the number of parameters of the FIR 
model and ARX model is n = 35 and ha = riB = 35. 

The six methods that we compare are: 

(1) The nuclear norm regularization for FIR model with 
cross validation (CV-FIR-N). 

(2) The nuclear norm regularization for ARX model with 
cross validation (CV-ARX-N). 

(3) The nuclear norm regularization for FIR model with 
SPARSEVA-PEC (SPe-EIR-N). 

(4) The reweighted nuclear norm regularization for FIR 
model with SPARSEVA-PEC (SPe-EIR-RN). 

(5) The nuclear norm regularization for ARX model with 
SPARSEVA-PEC (SPe-ARX-N). 


(6) The prediction error method (PEM) that uses the 
output-error model where it is assumed that the order 
of the system is already known. 

The results of the six numerical experiments are provided 
in Table 3: 

Table 3. Average model fits of 150 randomly 
systems with coloured noise disturbance 


Cramer 

Rao 

CV- 

FIR- 

N 

CV- 

ARX- 

N 

SPe- 

FIR- 

N 

SPe- 

FIR- 

RN 

SPe- 

ARX- 

N 

PEM 

90% 

85.94 

86.01 

83.32 

85.01 

84.73 

88.18 

77% 

74.42 

73.00 

71.41 

71.79 

73.21 

71.47 

68% 

64.41 

64.00 

63.11 

61.83 

65.59 

60.07 

55% 

52.39 

50.86 

52.65 

51.17 

58.27 

42.84 


Boxplots of fit scores from the experiment with 4 different 
noise levels are displayed in Figs. 5 to 8. 



Fig. 5. Boxplot for Cramer-Rao bound of 90%. 



Fig. 6. Boxplot for Cramer-Rao bound of 77%. 



Fig. 7. Boxplot for Cramer-Rao bound of 68%. 

From the above results, we can see that the SPARSEVA- 
PEC methods with an FIR or an ARX model show a 






















































Fig. 8. Boxplot for Cramer-Rao bound of 55%. 

very good performance as compared to the cross valida¬ 
tion methods and PEM. They especially perform better 
and have less bad estimates (outliers) when the SNR is 
low. Among all the SPARSEVA-PEC methods, the ARX 
structure gives better results than the FIR structure. 

The average computational time of each estimation for 
the 6 methods is also computed. The setting is the same 
as in the case of white noise disturbance. This average 
computational time is presented in Table 4. 


Table 4. Average computational time of the 6 
methods (in seconds) 


CV- 

FIR- 

N 

CV- 

ARX- 

N 

SPe- 

FIR- 

N 

SPe- 

FIR- 

RN 

SPe- 

ARX- 

N 

PEM 

182.90 

1152.91 

13.19 

142.89 

27.79 

0.83 


The computational time of the SPARSEVA-PEC methods 
is again much faster than the cross validation methods. 

6. CONCLUSIONS 

In this paper, a new method is developed to estimate a 
discrete time system using a high order FIR or ARX model 
with re-weighted nuclear norm regularization. The SPAR- 
SEVA framework is employed to avoid the computation¬ 
ally expensive searching of the tuning parameter in the re¬ 
weighted nuclear norm regularization problem. The paper 
also suggests the use of the PEC criterion for selecting the 
regularization parameter in the SPARSEVA framework. 
The numerical results show that the SPARSEVA-PEC 
nuclear norm regularization is competitive with respect 
to the accuracy of traditional technique of cross validation 
however the computational time is at least ten times faster. 
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